Apple Heart Study Interpretation


Summary

I think this is a landmark study in the use of wearables for clinical research, but I don’t think we should overvalue the finding that 34% of the notified participants who returned an ECG patch were confirmed to have AF, or the Apple Watch's usefulness as a screening or diagnostic tool. This is because the study deviated in a few significant ways from typical clinical trials that try to prove the effectiveness of devices or drugs:

Not everyone received an ECG chest patch, so we don’t know the false-negative rate.
The inclusion criteria (Apple Watch and iPhone users) created a biased study population.

If wearable research is biased towards a more affluent population, the interventions or discoveries that come out of it may be tailored to that population and may be less effective for a less affluent one. This is similar to why researchers think long and hard about how to enroll minority populations in their clinical trials.

Despite the caveats above, this study broke new ground in significant ways:

It recruited almost half a million people in 8 months.
It showed that these types of studies lose a lot of participants to follow-up (e.g., over 2,000 participants were notified of an irregular pulse, but data was collected from only 450 ECG chest patches).
It showed how participants react to a smartwatch notifying them of an abnormal heart rhythm while it is happening.

I think the study is super exciting in that it's a first attempt to use the capability of wearables to do long-term monitoring in a medical context. There’s obviously a lot of hype around studies like this, and I want us to be excited about the right things.

Check for understanding

SOLUTION:

0.1%

Precision and Recall

Classification Accuracy

From activity classification to AF detection, many tasks in wearable healthcare involve classification. In the case of the AHS, participants were classified by the “irregular pulse” notification into AF and non-AF groups. One way of discussing classification accuracy in binary classification is by looking at precision and recall.

precision: the proportion of all positives detected by the classifier that are true positives
recall: the proportion of all true positive cases that are detected positive by the classifier

This is demonstrated graphically below:

Precision and Recall. Source: Walber. Precision and recall. Nov 2014. CC-BY-SA-4.0 [Link](https://commons.wikimedia.org/wiki/File:Precisionrecall.svg)
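
Precision and recall can also be written in terms of counts. The symbols here are standard confusion-matrix notation that I'm introducing for illustration, not something from the AHS paper: TP is the number of true positives, FP the number of false positives, and FN the number of false negatives.

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
```

Both metrics share the same numerator. Precision divides by everything the classifier flagged, while recall divides by everything that was actually positive.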

QUIZ QUESTION:

In the AHS study, 404 participants got a new diagnosis of AF from 929 who had an irregular pulse notification, and 3,070 participants got a new diagnosis of AF from 293,015 who didn't get an irregular pulse notification. If the irregular pulse notification is used to classify participants who will and will not receive an AF diagnosis, what are the precision and recall? (Round to the nearest %)

ANSWER CHOICES:

Classification	Percentage
Precision
Recall

SOLUTION:

Classification	Percentage
Recall
Precision
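
One way to work through the quiz arithmetic is sketched below. It uses only the counts given in the question and treats a new AF diagnosis as the ground truth, which is the assumption the question itself makes. The variable names are mine.

```python
# Counts taken directly from the quiz question.
notified_with_af = 404        # notified AND newly diagnosed with AF
notified_total = 929          # everyone who got an irregular pulse notification
not_notified_with_af = 3070   # newly diagnosed with AF but never notified

# Map the counts onto confusion-matrix terms.
true_positives = notified_with_af
false_positives = notified_total - notified_with_af
false_negatives = not_notified_with_af

# Precision: of everyone notified, how many really had AF?
precision = true_positives / (true_positives + false_positives)
# Recall: of everyone who really had AF, how many were notified?
recall = true_positives / (true_positives + false_negatives)

print(f"Precision: {precision:.0%}")  # Precision: 43%
print(f"Recall: {recall:.0%}")        # Recall: 12%
```

Under these assumptions, fewer than half of the notifications led to a diagnosis, and the notification caught only about one in eight of the participants who were diagnosed.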

Further Research

Wearables are part of Digital Health. Learn more about it here.

The Wikipedia page for precision and recall.

The Framingham Study is a landmark study in cardiovascular health. Many interesting papers came out of it, and I highly recommend exploring them. Start here with the Wikipedia page.

If you would like to read up on the AHS on your own, you can find the clinical trial study record detail here.

Relevant Papers

Apple Heart Study - Perez MV, Mahaffey KW, Hedlin H, et al. "Large-scale assessment of a smartwatch to identify atrial fibrillation." N Engl J Med 2019;381:1909-1917. Link.
Apple Heart Study Response - Campion EW, Jarcho JA. "Watched by Apple." N Engl J Med 2019;381:1964-1965. Link.
Framingham Study - Wolf PA, Abbott RD, Kannel WB. "Atrial fibrillation as an independent risk factor for stroke: the Framingham Study." Stroke 1991;22:983-988. Link.
Wearable Health - Montgomery K, Chester J, Kopp K. "Health Wearables: Ensuring Fairness, Preventing Discrimination, and Promoting Equity in an Emerging Internet-of-Things Environment." Journal of Information Policy 2018;8:34-77. doi:10.5325/jinfopoli.8.2018.0034. Link.

New Vocabulary

Inclusion Criteria: Characteristics that potential study subjects must have for them to be included in the study.
Exclusion Criteria: Characteristics that disqualify potential study subjects from participating in a clinical study. Common examples are being under 18 or pregnant.
Classification Accuracy: A metric for evaluating the performance of a classifier -- the fraction of classifications that are correct. For rare events (like atrial fibrillation), this metric is unsuitable. For example, a classifier that classifies every data point as healthy would have a classification accuracy of 99%, as around 1 percent of the population has atrial fibrillation, but would be relatively useless.
Precision: The fraction of positive classifications that are correct.
Recall: The fraction of positive elements that are classified correctly as positive.
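
To make the Classification Accuracy entry concrete, here is a minimal sketch of the "always healthy" classifier. The population is hypothetical: 1,000 people with 10 AF cases, chosen to match the roughly 1 percent prevalence mentioned above. The `evaluate` helper is my own and not part of any library.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) for boolean labels.

    Precision or recall is None when its denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return accuracy, precision, recall


# Hypothetical population: 10 people with AF, 990 without (about 1%).
y_true = [True] * 10 + [False] * 990
# A classifier that labels every single person as healthy.
y_pred = [False] * 1000

accuracy, precision, recall = evaluate(y_true, y_pred)
print(accuracy)   # 0.99
print(precision)  # None (the classifier never predicts AF)
print(recall)     # 0.0
```

The accuracy looks excellent, but the recall of zero shows the classifier finds none of the AF cases, which is why precision and recall are the more informative metrics for rare events.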